[Cherry-Pick][Optimization] merge matmul and add (#6986) #7191
Merged
zoooo0820 merged 11 commits into PaddlePaddle:release/2.6 on Apr 9, 2026
Conversation
Thanks for your contribution!
fastdeploy-bot left a comment
🤖 AI Code Review · 2026-04-09
📋 Review Summary
PR overview: merges the matmul and add operations in UnquantizedLinearMethod into a single linear call to improve performance
Scope of changes: model_executor/layers/linear.py, tests/e2e/utils/
Impact tags: [Optimization] [OP]
📝 PR Compliance Checks
✅ PR title includes the [Optimization] tag
✅ PR description includes Motivation, Modifications, Usage, Accuracy Tests, and Checklist
✅ PR meets the contribution guidelines
Issues
| Level | File | Summary |
|---|---|---|
| 🔴 Bug | linear.py:91 | For paddle-format models, the weight shape may not match what paddle.nn.functional.linear expects |
Overall Assessment
The performance-optimization intent of this PR is clear, but there is a potential shape-compatibility issue that needs to be verified or handled.
```python
        f"bias must be 1D with size equal to the last dim of weight, "
        f"but got bias.shape={bias.shape}, weight.shape[-1]={layer.weight.shape[-1]}"
    )
out = paddle.nn.functional.linear(x, layer.weight, bias)
```
🔴 Bug: For paddle-format models, layer.weight has shape [input_size, output_size], while paddle.nn.functional.linear expects a weight of shape (output_size, input_size).
Based on code analysis:
- torch format: layer.weight has shape [output_size, input_size] (transposed in create_weights)
- paddle format: layer.weight has shape [input_size, output_size] (not transposed)
- UnquantizedLinearMethod's process_weights_after_loading is skipped
For paddle-format models, calling paddle.nn.functional.linear(x, layer.weight, bias) directly may therefore trigger a shape-mismatch error.
Suggestions:
- Verify compatibility with paddle-format models
- If they are not supported, add a conditional check or a clarifying comment
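The layout concern above can be illustrated with a small NumPy sketch (dimensions are made up for illustration, and NumPy matmul stands in for the paddle kernels): a weight stored as [input_size, output_size] and one stored as [output_size, input_size] only interchange after a transpose, so a kernel assuming one layout fails on the other:

```python
import numpy as np

# Illustrative sizes only; not taken from the PR.
batch, in_features, out_features = 4, 8, 16
x = np.random.rand(batch, in_features)

# "paddle-format" layout: weight stored as [input_size, output_size].
w_io = np.random.rand(in_features, out_features)
# "torch-format" layout: the same weight stored as [output_size, input_size].
w_oi = w_io.T

y = x @ w_io                        # works: (4, 8) @ (8, 16) -> (4, 16)
assert y.shape == (batch, out_features)
assert np.allclose(y, x @ w_oi.T)   # the [out, in] layout needs a transpose

# Feeding the [out, in] layout where [in, out] is assumed raises a shape error.
try:
    x @ w_oi
    raise AssertionError("expected a shape mismatch")
except ValueError:
    pass
```

This is why a weight layout that skips the transpose in process_weights_after_loading cannot be passed blindly to a linear call that assumes the other convention.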
Motivation
Performance optimization.
Modifications
Replace the matmul and add calls in UnquantizedLinearMethod with a single linear call.

With bias, this is almost always faster; without bias, performance drops slightly for small shapes (mostly Python-level scheduling overhead such as the extra if branch, since linear is itself implemented with matmul internally).
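As a rough sketch of the change (NumPy stands in for the paddle ops; function names here are illustrative, not the PR's actual code), the two separately dispatched ops collapse into one linear-style call:

```python
import numpy as np

def linear(x, weight, bias=None):
    # Stand-in for paddle.nn.functional.linear: x @ weight (+ bias).
    return x @ weight if bias is None else x @ weight + bias

def forward_old(x, weight, bias=None):
    # Previous path: matmul and add issued as two separate ops.
    out = x @ weight
    if bias is not None:
        out = out + bias
    return out

def forward_new(x, weight, bias=None):
    # New path: a single call; fewer Python-level dispatches when bias is set.
    return linear(x, weight, bias)

x = np.ones((2, 3))
weight = np.ones((3, 5))
bias = np.full(5, 0.5)
assert np.allclose(forward_old(x, weight, bias), forward_new(x, weight, bias))
```

The bias-free small-shape regression mentioned above comes from the extra Python branching inside the fused call, not from the kernel itself.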
Usage or Command
None.
Accuracy Tests
Accuracy is unchanged.
Checklist
- Add at least one tag in the PR title from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- If the PR targets a release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.